Multi-cell adoption #517

Open
wants to merge 10 commits into
base: main

@bogdando bogdando commented Jul 3, 2024

Split EDPM nodes into compute cells, mapping each cell 1:1 to a
dataplane nodeset.

Use the edpm_nodes var to describe the computes for each cell,
instead of the static host and IP vars that only worked for
single-cell standalone or multi-node single-cell cases.
Also explain the EDPM net config requirements in vars.sample for
use outside of ci-framework (local deployments).

Remove the edpm_computes vars, which are no longer used now that
stopping the control-plane TripleO services has moved into edpm-ansible.

Simplify ENV headers management by collecting them in a single place.

Provide a variable to define the source cloud Ironic topology,
for any cells with Ironic services.

Align the ordering of nova/libvirt and related services in the
service lists defined in multiple places with the ordering
specified in VA.

Align the names in the tests with the documented steps
to make the corresponding code easy to discover.

Adjust storage/storageRequests values to better fit
multi-cell test scenarios. Also provide the values in the docs and
add a comment to adjust them as needed.

Stop OVN services only if they exist and are active (they can be
missing, as on the cell controllers).
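
A minimal sketch of the "stop only if active" check, for illustration only. It is not the code from this PR: the helper names are hypothetical, the unit names in the usage note are assumptions, and the real tests would run the equivalent check on each host over SSH rather than locally.

```python
import subprocess


def is_active(unit):
    """Return True if systemd reports the unit as active.

    `systemctl is-active --quiet` exits non-zero for inactive units and
    for units that do not exist, so missing services are skipped too.
    """
    result = subprocess.run(
        ["systemctl", "is-active", "--quiet", unit], check=False
    )
    return result.returncode == 0


def units_to_stop(units, probe=is_active):
    """Keep only the units that the probe reports as active."""
    return [unit for unit in units if probe(unit)]
```

For example, `units_to_stop(["tripleo_ovn_controller.service", "tripleo_ovn_metadata_agent.service"])` would return only the units that are running on the current host (both names are assumed here). The `probe` parameter exists so the selection logic can be exercised without systemd.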

WIP: Retain host IPs on the internalapi network. Without that, edpm-ansible's os-net-config
changes the IPs on internalapi and also breaks Ansible connectivity to the EDPM hosts
(connectivity is restored after a node reboot).

Depends-On: openstack-k8s-operators/install_yamls#985
Depends-On: https://review.rdoproject.org/r/c/rdo-jobs/+/55827

Jira: #OSPRH-6548

@bogdando bogdando changed the title from "Multi-cell adoption" to "[WIP] Multi-cell adoption" Jul 3, 2024

bogdando commented Jul 8, 2024

The recent revision gives an overview of the approach taken, PTAL.
As long as we need to maintain the docs-as-code here, I'm afraid there is no cleaner solution than that.
For the ci-framework and rdo-jobs side of things, which should template all that in, I have WIP as well...
@jistr @SeanMooney

This change depends on a change that failed to merge.

Change openstack-k8s-operators/install_yamls#826 is needed.

tests/vars.sample.yaml (outdated review thread, resolved)

Merge Failed.

This change or one of its cross-repo dependencies was unable to be automatically merged with the current state of its repository. Please rebase the change and upload a new patchset.
Warning:
Error merging github.com/openstack-k8s-operators/data-plane-adoption for 517,18b084c576712d289411bfab3a4bfee4b60a3fbf

Merge Failed.

This change or one of its cross-repo dependencies was unable to be automatically merged with the current state of its repository. Please rebase the change and upload a new patchset.
Warning:
Error merging github.com/openstack-k8s-operators/data-plane-adoption for 517,4687df731d7a30007950c91ac21ee931ebfebf8c

@bogdando
Contributor Author

Based on feedback from @SeanMooney, we should not shift cell names as I proposed here. Instead, we want this:

  • Single-cell adoption (only the default cell exists): rename default to cell1.
  • Multi-cell (default, cell1, etc. exist): skip importing the default cell, as no compute hosts are supported there in a multi-cell OSP, hence there is nothing to adopt from it.
  • Or, multi-cell (default, cell1, etc. exist): do not rename the default cell, and import it as is.
  • Or, multi-cell (default, cell1, etc. exist): rename the default cell to the highest cell number + 1:
      default -> cell4
      cell1 -> cell1
      cell2 -> cell2
      cell3 -> cell3
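
The "highest cell number + 1" rule could be sketched roughly as follows. This is only an illustration in Python, not the shell code used in the tests or docs; the function name `plan_cell_renames` is hypothetical, and it assumes numbered cells are always named `cell<N>`.

```python
import re


def plan_cell_renames(source_cells):
    """Map each source cell name to its target name.

    'default' is renamed to cell<N+1>, where N is the highest existing
    cell number (0 when there are no numbered cells, so a single-cell
    deployment maps default -> cell1). Numbered cells keep their names.
    """
    numbers = [
        int(m.group(1))
        for name in source_cells
        if (m := re.fullmatch(r"cell(\d+)", name))
    ]
    next_number = max(numbers, default=0) + 1
    return {
        name: f"cell{next_number}" if name == "default" else name
        for name in source_cells
    }
```

The same function covers the single-cell case (only `default` exists) and the multi-cell case, which is the consistency argument for this option.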

Implementing any of these is quite challenging given the local requirement to keep the test code in the same form as it is documented (that is, as shell commands). This sophisticated logic would bring even more loops and array handling into the already overcomplicated code proposed in this PR draft.

@jistr @gibizer looking for your ideas on that

gibizer commented Jul 11, 2024

Since nova-operator allows a cell to be named "default", the simplest solution would be your second proposal: just import the cells as is. This also has the benefit that it works even if a customer wrongly attached computes to the default cell.
After GA, nova-operator will gain the ability to delete cells, so that feature can be used later to delete the "default" cell and make the deployment structurally the same as a greenfield 18 deployment.

bogdando commented Jul 11, 2024

I now lean toward implementing the last option: for multi-cell (default, cell1, etc. exist), rename the default cell to the highest cell number + 1. This keeps it consistent for single-cell and multi-cell...

/update: See the combined option, which allows either renaming or importing as is

Merge Failed.

This change or one of its cross-repo dependencies was unable to be automatically merged with the current state of its repository. Please rebase the change and upload a new patchset.
Warning:
Error merging github.com/openstack-k8s-operators/data-plane-adoption for 517,9541ce7f013b9b35b2cbd681cb30259da1a85157

Merge Failed.

This change or one of its cross-repo dependencies was unable to be automatically merged with the current state of its repository. Please rebase the change and upload a new patchset.
Warning:
Error merging github.com/openstack-k8s-operators/data-plane-adoption for 517,54d110489b8215e014580b8b77b05ce107fd1e04

Merge Failed.

This change or one of its cross-repo dependencies was unable to be automatically merged with the current state of its repository. Please rebase the change and upload a new patchset.
Warning:
Error merging github.com/openstack-k8s-operators/data-plane-adoption for 517,9954245ae2addd169cc80deab137024b7046f30e

Merge Failed.

This change or one of its cross-repo dependencies was unable to be automatically merged with the current state of its repository. Please rebase the change and upload a new patchset.
Warning:
Unable to update github.com/openstack-k8s-operators/install_yamls

Merge Failed.

This change or one of its cross-repo dependencies was unable to be automatically merged with the current state of its repository. Please rebase the change and upload a new patchset.
Warning:
Error merging github.com/openstack-k8s-operators/data-plane-adoption for 517,b946bca930ed67ffe94465e36e742abd9ba55d95

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/89e76ed92090466f87159254088744a4

✔️ noop SUCCESS in 0s
adoption-standalone-to-crc-ceph RETRY_LIMIT in 13m 39s
adoption-standalone-to-crc-no-ceph RETRY_LIMIT in 14m 00s
adoption-docs-preview FAILURE in 1m 17s

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/0d4b84d9382844ae8e34eaa8840fbc47

✔️ noop SUCCESS in 0s
adoption-standalone-to-crc-ceph FAILURE in 1h 41m 56s
adoption-standalone-to-crc-no-ceph FAILURE in 1h 45m 00s
✔️ adoption-docs-preview SUCCESS in 1m 15s

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/d679139049a44292a2e67475cc3087b5

✔️ noop SUCCESS in 0s
✔️ adoption-standalone-to-crc-ceph SUCCESS in 3h 06m 28s
adoption-standalone-to-crc-no-ceph FAILURE in 2h 17m 26s
✔️ adoption-docs-preview SUCCESS in 1m 16s

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/374cfe1f1d05470c88a6e1ff213fb399

✔️ noop SUCCESS in 0s
✔️ adoption-standalone-to-crc-ceph SUCCESS in 3h 11m 17s
adoption-standalone-to-crc-no-ceph FAILURE in 2h 16m 05s
✔️ adoption-docs-preview SUCCESS in 1m 17s

Merge Failed.

This change or one of its cross-repo dependencies was unable to be automatically merged with the current state of its repository. Please rebase the change and upload a new patchset.
Warning:
Error merging github.com/openstack-k8s-operators/data-plane-adoption for 517,d69b1bd0651313b5ab4857c59ba17d3205e2455f

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/771cb99c159944af9ce4e72696578412

✔️ noop SUCCESS in 0s
✔️ adoption-standalone-to-crc-ceph SUCCESS in 3h 10m 53s
adoption-standalone-to-crc-no-ceph FAILURE in 2h 15m 07s
✔️ adoption-docs-preview SUCCESS in 1m 21s

@bogdando bogdando force-pushed the multi-cell branch 2 times, most recently from 7e2ce97 to 3a7c425 Compare December 20, 2024 10:08
This change depends on a change that failed to merge.

Change openstack-k8s-operators/install_yamls#985 is needed.

Provide a static multi-cell config for databases and messaging
for the adoption guide and tests, comprising 3 cells.

Keep renaming of the 'default' cell consistent for single-cell and multi-cell cases:

* default becomes cellX (or it can be imported as is, for a multi-cell
  case only)
* cell1 is mapped to the openstack-cell1 osdp node set
* cell2 is mapped to the openstack-cell2 osdp node set, etc.
* cellX (X=3 here) is mapped to openstack-cell3. Alternatively, the
  default cell retains its name for the openstack-default osdpns
  mapping

Evaluate podified MariaDB passwords for cells from osp-secret
to align the tests with the documented commands. Remove the no longer
needed podified DB password variable.
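
As an illustration of the cell-to-nodeset mapping listed above, here is a small sketch. It assumes the caller already has a source-to-target cell rename map; the function name `nodeset_names` and the duplicate check are assumptions made for this example, and only the `openstack-<cell>` naming comes from this commit message.

```python
def nodeset_names(cell_renames, prefix="openstack-"):
    """Return {target cell: nodeset name}, one nodeset per target cell.

    `cell_renames` maps source cell names to target cell names. Two source
    cells mapping to the same target would break the 1:1 cell-to-nodeset
    relation, so that case is rejected.
    """
    targets = list(cell_renames.values())
    if len(set(targets)) != len(targets):
        raise ValueError("each target cell must map to exactly one nodeset")
    return {target: f"{prefix}{target}" for target in targets}


# Three-cell example from the commit message: default renamed to cell3.
renamed = {"default": "cell3", "cell1": "cell1", "cell2": "cell2"}
# Alternative: the default cell is imported as is.
as_is = {"default": "default", "cell1": "cell1", "cell2": "cell2"}
```

With `renamed`, the default cell ends up on `openstack-cell3`; with `as_is`, it ends up on `openstack-default`.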

Make Ansible and shell variables compute-cell aware.

Rework vars and secrets YAML values for the source and EDPM
nodes so that the different cell naming schemes
in OSP/TripleO and RHOSO are not confused.

Remove cached fact for pulled OSP configuration as it can no longer
be generated in a multi-cell setup, where related shell variables
become bash arrays.

Simplify ENV headers management by collecting them in a single place.

Adjust storage/storageRequests values to better fit
multi-cell test scenarios. Also provide the values in the docs and
add a comment to adjust them as needed.

Remove source_db_root_password as it is directly evaluated from
tripleo passwords into an env var.

Run mysql commands in individual pods.
Finished pods take time to terminate; this avoids errors where
subsequent mysql commands fail because the old and new pods use the
same name.
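
One way to avoid the pod name collision described above is to give each command its own uniquely named pod. The sketch below only builds the `oc run` argument list and does not execute it. The helper name, the name prefix, and the image and host in the usage note are hypothetical, and credentials are deliberately left out.

```python
import uuid


def mysql_pod_command(sql, host, image, namespace="openstack",
                      prefix="mariadb-client"):
    """Build an `oc run` argv that runs one mysql statement in its own pod.

    A random suffix makes every pod name unique, so a new command never
    collides with a previous pod that is still terminating.
    """
    pod_name = f"{prefix}-{uuid.uuid4().hex[:8]}"
    return [
        "oc", "run", pod_name,
        "--namespace", namespace,
        "--image", image,
        "-i", "--rm", "--restart=Never",
        "--",
        "mysql", "-h", host, "-e", sql,
    ]
```

A call such as `mysql_pod_command("show databases;", "db.example", "example.registry/mariadb:latest")` returns a list that can be passed to `subprocess.run`. Whether the actual change uses random suffixes or another naming scheme is not stated in this commit message.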

Rename nodesets to openstack-cell1, which is needed for adoption of
remaining multi-cell aware services in a follow up.

Signed-off-by: Bohdan Dobrelia <[email protected]>

Fix

Signed-off-by: Bohdan Dobrelia <[email protected]>

Declare RUN_OVERRIDES before it is used.

Use env vars instead of docs generation conditions to reuse the same
code in tests:
* Add MARIADB_RUN_OVERRIDES to cover all overrides and client annotations
* Add missing definitions for rhoso/ospd namespace specific vars
* Use env TRIPLEO_PASSWORDS for all cases as OSPDo still deploys
  tripleo
* Define and use NAMESPACE (default openstack) instead of
  RHOSO18_NAMESPACE or OSPDO_NAMESPACE. Remove unused rhoso18 ns value
  (only in this guide).

Signed-off-by: Bohdan Dobrelia <[email protected]>

Illustrate how commands in scripts could have comments
that become (almost as is) native AsciiDoc footnotes.

When copying code into the docs, only minimal adjustments are
needed, like adding a '$' prefix (or '>' for multiline commands).
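
The prefixing step could look roughly like this. It is a hypothetical helper, not part of the PR: it assumes a continuation line is one that follows a line ending in a backslash, and it passes comment lines through unchanged without modelling their conversion to footnotes.

```python
def to_doc_listing(script_lines):
    """Prefix shell lines for a docs listing.

    '$ ' starts a new command; '> ' marks a continuation of the previous
    line (one that ended with a backslash). Blank lines and standalone
    comment lines are passed through unchanged.
    """
    out = []
    continued = False
    for line in script_lines:
        stripped = line.strip()
        if not continued and (not stripped or stripped.startswith("#")):
            out.append(line)
            continue
        out.append(("> " if continued else "$ ") + line)
        continued = line.rstrip().endswith("\\")
    return out
```

For example, the two-line command `oc get pods \` followed by `  -n openstack` becomes `$ oc get pods \` and `>   -n openstack`.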

Signed-off-by: Bohdan Dobrelia <[email protected]>

Those will be added back in a follow-up, which completes
the guide and tests for extra cell2 and cell3.

Signed-off-by: Bohdan Dobrelia <[email protected]>

Assume a single cell1 for now.

Remove the edpm_computes and computes env vars
from the tests, as they are not multi-cell aware and should no longer be
needed. The docs still use that env var; it will be removed in the
multi-cell adoption follow-up, which also covers EDPM multi-cell
adoption.

This is required because the rdo-jobs dependency introduces that
change for edpm_nodes and provides a common base for this and future
multi-cell follow-ups.

Signed-off-by: Bohdan Dobrelia <[email protected]>

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/4c398840db3e41968efc8792ba4f7afc

✔️ noop SUCCESS in 0s
adoption-standalone-to-crc-ceph FAILURE in 1h 50m 32s
adoption-standalone-to-crc-no-ceph FAILURE in 1h 48m 20s
✔️ adoption-docs-preview SUCCESS in 1m 20s

Merge Failed.

This change or one of its cross-repo dependencies was unable to be automatically merged with the current state of its repository. Please rebase the change and upload a new patchset.
Warning:
Error merging rdoproject.org/rdo-jobs for 53192,137

bogdando and others added 4 commits December 24, 2024 13:39
Signed-off-by: Bohdan Dobrelia <[email protected]>

Split EDPM nodes into compute cells, mapping each cell 1:1 to a
dataplane nodeset.

Use the edpm_nodes var to describe the computes for each cell,
instead of the static host and IP vars that only worked for
single-cell standalone or multi-node single-cell cases.
Also explain the EDPM net config requirements in vars.sample for
use outside of ci-framework (local deployments).

Remove the edpm_computes vars, which are no longer used now that
stopping the control-plane TripleO services has moved into edpm-ansible.

Simplify ENV headers management by collecting them in a single place.

Provide a variable to define the source cloud Ironic topology,
for any cells with Ironic services.

Align the ordering of nova/libvirt and related services in the
service lists defined in multiple places with the ordering
specified in VA.

Align the names in the tests with the documented steps
to make the corresponding code easy to discover.

Adjust storage/storageRequests values to better fit
multi-cell test scenarios. Also provide the values in the docs and
add a comment to adjust them as needed.

Stop OVN services only if they exist and are active (they can be
missing, as on the cell controllers).

Signed-off-by: Bohdan Dobrelia <[email protected]>

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/5f19276989c847a58099005ebe196943

✔️ noop SUCCESS in 0s
✔️ adoption-standalone-to-crc-ceph SUCCESS in 2h 56m 48s
adoption-standalone-to-crc-no-ceph FAILURE in 2h 08m 20s
✔️ adoption-docs-preview SUCCESS in 1m 25s

Labels: check-before-merge/depends-on (Don't forget to check depends-on before merging), do-not-merge/hold

2 participants